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Abstract. Leakage of confidential information represents a serious security risk. Despite 
a number of novel, theoretical advances, it has been unclear if and how quantitative ap- 
proaches to measuring leakage of confidential information could be applied to substantial, 
real- world programs. This is mostly due to the high complexity of computing precise leak- 
age quantities. In this paper, we introduce a technique which makes it possible to decide 

C^l ' if a program conforms to a quantitative policy which scales to large state-spaces with the 

help of bounded model checking. 

Our technique is applied to a number of officially reported information leak vulnerab- 
ilities in the Linux Kernel. Additionally, we also analysed authentication routines in the 

^^D , Secure Remote Password suite and of a Internet Message Support Protocol implement- 

ation. Our technique shows when there is unacceptable leakage; the same technique is 
also used to verify, for the first time, that the applied software patches indeed plug the 
information leaks. 

This is the first demonstration of quantitative information flow addressing security 
c/j ' concerns of real-world industrial programs. 

o ' 






>• ■ 1. Introduction 

oo : 

Quantitative Information Flow (QIF) [3J [11] aims to provide techniques and tools able 
to quantify leakage of confidential information. As a motivating example consider a proto- 
typical password checking program 
f— ^ ■ if (password==guess) access=l; else access=0; 

Notice how there is an unavoidable leakage of confidential information in this program: an 
attacker observing the value of access will be able to infer if he guessed the right password 
(complete leakage if he did guess it right) and if the guess was wrong he will have eliminated 
one possibility from the search space. Notice also how essential the amount of information 
leaked is: if the amount leaked is very small then the program could as well be considered 
secure. 

If, as the above example illustrates, leakage is somehow unavoidable then the real question 
is not whether or not programs leak, but how much. This point is what makes Quantitative 
Information Flow an appealing theory. In a nutshell, QIF aims to measure the amount of 
information from confidential data (in the above example the variable password) that an 
attacker who can read/write the public input data (guess) will be able to infer from some 
observable variable (access). 

However, implementing a precise QIF analysis for secret sizes of more than a few bits 
is computationally infeasible; roughly speaking this is because classical QIF computes the 
entropy of a random variable whose complexity is the same as computing all possible runs of 
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the program. Even when abstraction techniques and statistical sampling are integrated with 
QIF [9] to help the scalability issues a useful analysis for real code still seems problematic. 

In this paper, we introduce a useful quantitative analysis for C code: we will demonstrate 
the analysis on reported information leakage vulnerabilities in the Linux Kernel and the 
PHP interpreter. All of the covered vulnerabilities are referenced by the standardised 
vulnerability repository CVE from Mitrqj. 

To address the computational feasibility of the quantitative analysis we shift the focus 
from the question "How much does it leak?" to the simpler quantitative question "Does it 
leak more than k?" . We will show how the questions are related and more importantly we 
will show that off-the-shelf symbolic model checkers like CBMC [5] are able to efficiently 
answer the second kind of question. CBMC is a good choice for several reasons: (i) it makes 
it easy to parse and analyse large ANSI-C based projects (ii) it models bit-vector semantics 
of C accurately which makes it able to detect arithmetic overflows amongst others, which 
turns out to be important (hi) nondeterministic choice functions can be used to easily model 
user input, which also enjoys efficient solving due to the symbolic nature of the model checker 
(iv) despite being a bounded model checker, CBMC can check whether enough unwinding 
of the transition system were performed to prove that there are no deeper counterexamples. 

Our experiments show that the analysis not only quantifies the leakage but also helps in 
understanding the nature of the leak. In particular, the counterexample produced by the 
model checker, when a leakage property is violated, can provide insights into the cause of 
the leak. For example, we can extract a public user input from the counterexample needed 
to trigger a violation. 

Another surprising result of our experiment is that in certain circumstances we were able 
to use our technique to prove whether the official patch provided for the vulnerability does 
actually eliminate the information leak. This is achieved by point (iv) from above, when 
the model checking process is actually complete. 

In summary the main technical contributions of this paper are the following: 

(1) We present the first quantitative leakage analysis of systems software. 

(2) We show how to express Quantitative Information Flow properties that can be 
efficiently checked using bounded symbolic model checking. 

(3) We show that the technique not only quantifies leakage in real code but also provides 
valuable information about the nature of the leak. 

(4) In some cases we are able to prove that official patches for reported vulnerability 
do indeed eliminate leakage; these constitute the first positive proofs of absence of 
QIF vulnerabilities for real-world systems programs. 

2. Model of Programs and Distinctions 

We aim to model the input/output behaviour of a C function where inputs are formal 
arguments to the function and outputs are either return values or pointer arguments. 

In the following we will consider P to be a C function taking high and low inputs noted 
h, I; we call observables low variables whose values are "publicly available" after running P. 
As an example consider the following "modulo" program 
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O = (h % 4 ) + 1 

and suppose h is a 4 bits variable with values 0..15 and 1 a 1 bit variable with values 0,1; 
then the low input for P is the variable I and the observable is the variable o whose possible 
values are 0..5. 

Formally, a program P is modelled as transition system TS = (S, T, I, F) with S being 
the program states, T C S x S are the program transitions and I the initial states and F 
the final states. Let us define a successor function for a state s G S 

Post(s) = {s' £S | (s,s') GT} 

A state s is in F if Post(s) = 0. A path is a finite sequence of states it = soSis 2 ■ ■ ■ s n such 
that so & I and s n E F. 

A state is a tuple S = Sh x Sl of the pair of confidential input H and low input L. We 
consider initial/final or input/output pairs of states of a path, ((h,l),o) where the second 
component is the output o produced by the final state drawn from some output alphabet O. 
In the above example an input/output pair would be ((5, 1), 2) representing the computation 
(5%4) + l = 2. 

Confidential inputs are denoted as h £ H, low inputs I € L, and low observations o € O, 
where the output behaviour of the function is always a low observation and the input is an 
initial state (h,l). A distinction on the confidential input through observations O is one 
where there exists at least two paths through P, modelled as TS, which leads to different 
observations for different confidential input but constant low input. 

We define an equivalence relation —pj on the values of the high variables as follows: 
h ~p,z h' iff if ((h, /), o), {{h! , l),o') are input/output pairs in P then o = d ' . 

Hence, two high values are equivalent (w.r.t. a low value I) if they cannot be distinguished 
by any observable. In the running example an equivalence class in ~ p\ would for example 
be {1,5,9,13}. The equivalence relation associated to P, I is an element of the set of all 
possible equivalence relation on the values of high. 

Let I(Sh) as the set of all possible equivalence relations on a set Sh- Define on Z{Sh) 
the order: 

(1) ~E~<^ Vsi, s 2 (s\ ~ s 2 =>■ si « s 2 ) 

where w, ~ € Z{Sh) and s\, s 2 £ Sh- E defines a complete lattice over Sh- It is a refinement 
order with bottom element being the relation relating every state and top element being 
the identity relation. This is described as the Lattice of Information |10j . 

Non leaking programs (i.e. satisfying non-interference [7]) are characterised as follows: 

Proposition 1. P is non-interfering iff for all I, —pi is the least element inZ(Sn) ■ 

An attacker controlling the low inputs can be modelled by an equivalence relation ~p 
corresponding to a particular ~p^. 

Formally, we define a quantitative policy as a non-negative natural number N. A relation 
—Pl breaches a policy if | —pi \ > N (where | —pi \ is the number of equivalence classes of 
—Pl). In our model, an attacker will always choose a relation breaching the policy, provided 
that given a policy and a program such a relation exists. We use ~p with the program P 
being initialised with the attacker's choice of o 
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In the above example, a choice could be ~p = — po corresponding to the program 1=0 ; 
o = (h % 4 ) + 1. 

Quantitative Information Flow uses information theoretical measures like Shannon en- 
tropy to measure leakage of confidential information. The measure of a program can be 
broken down into two main steps [11| |8] : 

(1) interpret the program as a random variable Rp 

(2) compute the entropy of Rp (noted H(Rp)) 

It has been shown that Rp and ~p coincide [111 [13] . For example for the modulo program 
above under the assumption of uniform distribution on the input there are 4 equivalence 
classes each having probability -r. The Shannon entropy of that program is then 

4*-^log 2 (^) = 2 

This number 2 represents the fact that the observations reveal which of the 4 possible classes 
(i.e. 2 bits of information) the high input belongs to. 

Rp and ~p are also order related as the following proposition shows [8]: 

Proposition 2. ~p IZ ~p/ iff for all probability distributions H{Rp) < H(Rp>) 

To further understand the importance of ~p in Quantitative Information Flow we need 
to introduce the information theoretical concept of channel capacity: consider the password 
check example from the introduction. Suppose the password is a 64 bits randomly chosen 
string; we have two equivalence classes, one with 1 element so having probability ^et, the 
other class with 2 64 — 1 elements having thus probability 1 — ^bt- The entropy is then 
3.46944695 x 1CP 18 : as expected a password check of a big password should leak very little. 
Suppose however that the probabilities of the high inputs are such that both equivalence 
classes have probability ^. Then the entropy dramatically raises to 1 which is the channel 
capacity, i.e. the maximum leakage achievable given two classes: log 2 (2) = log 2 (| — p |). In 
the modulo example the channel capacity is 2 which happens to be given by the uniform 
distribution on the high input. Other distributions on the high input cannot give higher 
entropy: for example if we consider the distribution where all even numbers have equal 
probability „, and all odd numbers have probability then the resulting entropy will be 1. 

The following result establishes basic relationships between leakage, channel capacity, 
and number of distinctions: 

Proposition 3. 

(1) P is non-interfering iff log 2 (| — p |) = 

(2) The channel capacity of P is log 2 (| ~p |) . 

(3) If for all probability distributions H(Rp) < H(Rpr) then | ~p ( < | ~p> 

Point (1) is proved in [4], (2) in |12j and (3) is a consequence of proposition [2] whose proof 
is in [8J. Hence a lower bound on | ~p | provides a lower bound on the channel capacity of 
the program P. 



The channel capacity is the maximum possible leakage where we consider all possible probability distri- 
butions on the inputs |12] 
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Hence, because of proposition [3] the inequality | ~p | > N, which is at the heart of our 
analysis, can be rephrased to the following statement: in a setting where the distribution of 
the secret is the most favourable for the attacker then the leakage is at least log 2 (iV) bits. 

3. Encoding Distinction-Based Quant Policies 

Recall that for a program P a quantitative policy is a natural number N which limits 
the cardinality of ~p to N. 

In other words, a program violates a quantitative policy if it makes more distinctions 
than what is allowed in the policy. A leaking programs is one breaching the policy N = 1 
in the above definition. 

We take ideas from assume-guarantee reasoning [17] to encode such a policy in a driver 
function, which tries to trigger a violation, i.e. producing a counterexample, of the policy. 
If the policy states that the function f unc is not allowed to make more than 2 distinctions 
then this is modelled as shown in Program [TJ This driver only has a high component as a 
state, which is passed to the function f unc where the policy is tested on. 



int 


hl,h2,h3; 










int 


ol,o2,o3; 










hi 


= input () ; 


h2 


= input () ; 


h3 = 


= input ( ) ; 


ol 


= f unc (hi) 










o2 


= func(h2) 










assume (ol != ( 


>2); 


// (A) 






o3 


= func(h3) 










ass 


ert(o3 == < 


>1 II 


o3 == o2) 


; // 


(B) 



Program 1: Example driver checking for 2 distinctions 

Drivers always have a similar structure: we model the secret by a nondeterministic choice 
function input () as a placeholder for all possible values of that type; then for a policy of 
checking for N distinctions, the function under inspection is called N times. The crucial 
step (A) is the use of the assume statement after the calls: the driver assumes that, in this 
case, there are two different return values found already. The function is called an iV + 1th 
time and at (B) the driver asserts that the next output is either one of the previously found 
outputs. 

The assume statement only considers execution paths which satisfy the given boolean 
formula, all other paths are rejected. Further, the bounded model checker used will try to 
find a counterexample to the negated assertion claim, which is only satisfiable if and only 
if a counterexample exists. If it is unsatisfiable means that the original claim holds, i.e. 
the program conforms to the policy. The verification condition generated by the bounded 
model checker for the policy in Program [1] is: 

ol != o2 =>- (o3 == ol || o3 == o2) 

Where the bounded model checker tries to find a counterexample (execution path) using 
the negated claim such that the following holds 
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Input: Function func, types t,t' ,t' ' 

Output: Driver. c 

t o_l, ..., o_n, o_n+l; 

t' h_l, ..., h_n, h_n+l; 

t" 1; 


comparison eq. 


-t, 


bound k, 


threshold N 


h_l = input (); ... h_n = 

1 = input ( ) ; 

o_l = func(h_l, 1) ; 


input () ; 












o_n = func(h_n, 1); 
assume (! eq_t(o_l , o 


_2) 


Sc& !eq_t(o 


-1, o 


_3) && .. 


.) 


) 




o_n+l = func(h_n+l, 
assert (eq_t (o_n+l , 


i); 

o_l) 


1 1 eq_t(o 


_n+l, 


o_2) I I 




•); 





Algorithm 1: Template to syntactically generate a driver for an N distinction policy 

ol != o2 A o3 != ol A o3 ! = o2 

i.e. that there are three distinctions possible. 

Another possibility is that the function func does not even make two distinctions, such 
that the assume statement at point (A) is always false, which leads to proving the policy 
(or any policy) vacuously true, because for any assertion Q the verification condition is true, 
i.e. false ==£- Q. 

3.1. Bounded Model Checking. We use the bounded model checker CBMC to verify 
or falsify a policy. CBMC encodes an ANSI-C program into a propositional formula by 
unwinding the transition relation and user defined specifications up to some bound. This 
formula is only satisfiable if there exists an error trace violating the specification. 

The tool can also check if the unwinding bound is sufficient by introducing unwinding 
assertions, which are assertions on the negated loop guards. This ensures that no longer 
counterexample can exist than the used bound. To prove any properties the analysis has to 
pass unwinding assertions, otherwise it can only be used as a way to find counterexamples 
up to the unwinding bound. 

The C program gets encoded into constraints C and the property - user defined assertions 
- are encoded in P. Then the model checker tries to find a satisfiable assignment to the 
formula 

C A^P 

where P is an accumulation of the assumptions and assertions made in the program text. 
Thus if there are two assume statements in the driver with expressions E\ and E 2 and one 
assert statement with expression Q then P is 

P = E 1 AE 2 => Q 

3.2. Driver. A general template for a driver is described in Algorithm [H The inputs to 
the algorithm are the function func to be analysed, possibly up to three different types for 
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the input/output pair ((h,l),o), and a comparison function eq_t which returns true if the 
arguments of type t are equal, where t is the type of the observation of function f unc. This 
comparison function could be as simple as == of C, or a more complex function, such as 
memcmp, if t is an array or string. Also note, that the observations o_i do not need to be 
only return values, but can also be pointer arguments to func. 

Proposition 4 (Correctness of driver template). If the driver template in Algorithm^ is 
successfully verified up to a bound k (i. e. the negated claim is unsatisfiable) then the function 
func does not make more than N distinctions on the output within the bound k. Formally, 
we state that the validity of the driver implies the validity of the following implication 

01 / o 2 A oi 7^ 03 A • • • A o n _i ^ o n => o n+ i = 0\ V • • • V o n+ i = o n 

Thus, we can make the following claims on the result of the model checking process: For 
a given bound k and a policy, 

• if the model checker finds a counterexample then the policy is violated, i.e. the 
program makes more distinctions than specified 

• if the process ends with a successful verification of the policy without unwinding 
assertions then the policy holds up to an unwinding of k. 

• if the process ends with a successful verification of the policy with unwinding asser- 
tions then the policy holds for any number of iterations. 

4. Checking Quantitative Policies 

The steps in checking a program or function for the compliance with a quantitative policy 
are as follows: (1) Define the input state (h, I) and output state o in the code, i.e. the 
confidential input h, the low input I and the observation o (2) Define the maximum number 
of distinctions in the policy and an unwinding factor k (3) Generate a driver function using 
the template in Algorithm Q] (4) Run CBMC on the driver. If the driver is successfully 
verified, potentially increase the unwinding factor. 

4.1. Modelling Low Input. A crucial aspect of our analysis is to model low user input, 
which is most of the time responsible for triggering a bug which causes the information 
leak. These bugs only happen on a very restricted number of execution paths and could be 
exploited by a malicious user choosing a special user input. This scenario generally applies 
when studying many CVE reported information leakage vulnerabilities. 

Let us look at the following simplified code in Program [21 which contains an integer 
underflow, taken from the vulnerability CVE-2007-2875 in the linux kernel. 

At first, it seems not possible that the point (C) where the secret h gets returned is 
ever executed, because exactly that check is done in (A) which reduces the variable nbytes 
to be within the bound buf sz. However, due to wrong choice and combination of types, 
the subtraction in (B) causes an underflow in nbytes for a very large ppos value. And 
unfortunately, ppos is a user controlled input variable, such that when its value is chosen 
correctly, point (C) is reached. 

In this case, a state in the system is the tuple (h, I) which represents the arguments to 
the function underflow, i.e. the formal parameters h and ppos; observations are the return 
values of this function. The generated driver can automatically find the low part of a state 
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typedef long long loff_t; 

typedef unsigned int size_t; 

int underf low(int h, loff_t ppos) { 

int bufsz; 

size_t nbytes; 

bufsz=1024; 

nbytes=20; 

if (ppos + nbytes > bufsz) // (A) 

nbytes = bufsz - ppos; // (B) 
if (ppos + nbytes > bufsz) { 

return h; // (C) 
} else { 

return ; 
} 



Program 2: Integer underflow causing a leak 

which triggers such subsequent information leaks, because the analysis instructs the model 
checker to find any possible execution path satisfying the assumptions and assertions on 
the outputs, given nondeterministic high values and fixed low inputs. As SAT-based model 
checking is precise down to the individual bit, it will find a low input which triggers the 
underflow and uncovers the leak. 

CBMC generates a counterexample falsifying a policy of e.g. no leakage and thereby 
having triggered the integer underflow. The following excerpt of the counterexample 
State 14 file underflow. c line 40 function main thread 



underflow: :main: :1: : 1=1706688912 (00000000. . . 
State 35 file underflow. c line 13 function underflow thread 

underflow: : underflow: :1: :nbytes=4027596816 (11110000. . . 
shows that a low input of 1=1706688912 lead to an nbytes which underflowed from the 
previous value 20. 

Clearly, for such leaks to be detected it needs bit-level precise reasoning, just like SAT- 
based bounded model checkers support. 

4.2. Environment. In model checking, the environment, like library function calls or gen- 
erally functions and data structures which have no implementation, need to be modelled 
in a way which allows for the property to be verified. Out of the box, CBMC replaces 
function calls with no implementation with nondeterministic values. 

As our analysis needs to check for equality on inputs and outputs of functions a certain 
number of common library function have to be modelled in a way which preserves their 
original semantics. For example, the usual library C functions memcmp, and strcmp are 
implemented in a way which return if their arguments are equal and a value not equal to 
if they are not equal. The functions memset and memcpy actually set an array of integers 
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Description 


CVE Bulletin 


LOC 


k* 


Patch Proof 


log 2 (N) 


Time 


AppleTalk 


CVE-2009-3002 


237 


G4 


/ 


6 bit 


lh39m 


tcLfilLnode 


CVE-2009-3612 


146 


G4 


/ 


6 bit 


3m34s 


sigaltstack 


CVE-2009-2847 


199 


128 


/ 


7 bit 


49m50s 


cpusett 


CVE-2007-2875 


63 


64 


X 


6 bit 


lm32s 



SRP getpass 
login_unix 



93 
128 



• 



1 bit 

2 bit 



0.128s 
8.364s 



Table 1. Experimental Results. * Number of unwindings f From Section 14.11 



or characters to a certain value or to the content of another array. The same applies to 
linux kernel utility functions such as copy_to_user and copy_f rom_user which copy memory 
blocks to or from userspace. 

For example, a memcmp implementation is shown in Program O 



int memcmp(char *sl, char *s2, unsigned int n) { 

int i; 

for(i=0;i<n;i++) { 

if(sl[i] != s2[i]) return -1; 

> 

return 0; 
} 



Program 3: Simplified memcmp model 



5. Experimental Results 

We applied our technique to CVE reported information leakage vulnerabilities in the 
Linux Kernel. In the experiments we checked for policy violations and proved whether 
official patches resolve the information leakage. We also analysed authentication routines of 
the Secure Remote Password protocol (SRP) and of a Internet Message Support Protocol 
implementation. A summary of the results is shown in Table [TJ 

5.1. Linux Kernel. We define information leakage in the kernel always as parts of the 
kernel memory which gets mistakenly copied to user space, i.e. the virtual memory allocated 
to conventional applications. Clearly, this should not happen as anything allocated in the 
kernel space is not meant to be seen by users (except within the bounds of normal user/kernel 
interactions), especially in multi-user systems like Linux. Thus, in all examples the kernel 
memory is modelled as nondeterministic values. 

The interface between user and kernel space are system calls or syscalls in short. Syscalls, 
like normal functions, have a number of arguments and a return value where the kernel can 
transfer data structures or single values back and forth. This is the crucial point in the 
system where information leakage is most common. 

AppleTalk. The specific vulnerability CVE-2009-3002 in the appletalk network code 
shows a quite common cause of information leakage: a user requests, by a syscall, that a 
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structure gets filled with values and returned to user land. The developer however forgot to 
assign values to all fields in the struct, thus these missing fields get "filled" with unspecified 
kernel memory, as it is allocated on the stack. This CVE security bulletin actually comprises 
six different vulnerable network protocol implementations, all following the same leakage 
pattern. We will only present the affected code of the AppleTalk implementation - the 
same kind of analysis applies to all six vulnerabilities. 

In this case the structure returned to the user is shown in Program SJ The leaking 



struct sockaddr_at { 

u_char sat_len, sat_family, sat_port; 
struct at_addr sat_addr; 

union { 

struct netrange r_netrange ; 

char r_zero [ 8 ] ; 

} sat_range ; 

>; 

#define sat_zero sat_range .r_zero 



Program 4: Complex observation struct leads to leak from sat_zero. 
function is atalk_getname in net/appletalk/ddp . c is shown in Program 



static int atalk_getname (struct socket *sock, struct sockaddr *uaddr, 

int *uaddr_len, int peer) { 
struct sockaddr_at sat; 

// Official Patch. Uncommented to trigger leak 
//memset (&sat . sat_zero , 0, sizeof (sat . sat_zero)) ; 

: // sat structure gets filled 
memcpy (uaddr , fesat, sizeof (sat) ) ; 
return ; 



Program 5: Function introducing the leak for CVE-2009-3002. 

In the function, the structure sat gets filled with values provided by the kernel, at the 
end the whole structure is copied via memcpy to the address of the uaddr pointer, which 
is indirectly, via the syscall getsockname copied back to user land. However, the field 
sat . sat jzero has not been initialised, thus a number of bytes of kernel memory get copied 
back to the user. 

The secret is implicitly modelled by allocating the sat structure with nondeterministic 
values; observations are also of type sockaddr_at. The driver uses as parameter eq_t the 
library function memcmp to compare memories. 

The model checker found a counterexample for a 6 bit policy within 1 hour and 39 
minutes. Once the official patch was applied of setting the sat structure to with memset, 
our driver successfully verified the policy in about the same time with unwinding assertions, 
thus it proved that the patch stops the leak. 
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tcf_fill_node. This information leak happens in the netlink subsystem of the kernel. 
The function tcf _f illjiode prepares a struct tcmsg to be sent back to the user. However, 
the programmer made a typing mistake and filled a field tcm__padl twice instead of the 
second time for tcm__pad2. 



struct tcmsg *tcm; 














nlh 


= NLMSG_NEW(skb, p: 


Ld, se 


q, event, 


sizec 


f Otcm) , 


flags) ; 


tcm 


= NLMSG_DATA(nlh); 














tcm- 


->tcm_family = AF_UNSPEC; 












tcm- 


->tcm padl = 0; 














tcm- 


->tcm padl = 0; // 


typo, 


should be 


tcm_ 


_pad2 


instead. 



Program 6: Function excerpt introducing the leak for CVE-2009-3612. 

This leaks kernel memory from tcm__pad2 back to userspace. Here, we again modelled 
kernel memory implicitly by the memory allocated for tcm through the function NLMSG_DATA, 
which initialised the fields of the struct with nondeterministic values. The observation is 
the filled out variable tcm, the low user input is a simple integer variable not mentioned 
here for brevity. 

The official patch which was applied to fix the leak is simply changing the last line above 
to tcm->tcm__pad2=0. We were again able to prove that this patch successfully fixes the 
security hole and otherwise the program violates a leakage policy of 6 bits. 

Without the patch, a counterexample is found within 3 minutes and 34 seconds; with the 
patch, the program is verified within about the same time. 

sigaltstack. The leakage for this vulnerability is intricate and only manifests itself on 
64-bit processors. On such a system, the struct stack_t, as shown in Program [71 will be 
padded to a multiple of 8 bytes because on 64-bit systems void* and size_t are both 8 
bytes (instead of 4 bytes for 32-bit systems), while an integer type remains 4 bytes. Thus, 
the size of stack_t is padded to 24 bytes, while on a 32-bit system it remains unpadded at 
12 bytes. 



typedef 


struct 
void 


sigaltstack { 
.user *ss_sp; 














int ss. 


.flags; // 4 bytes 


padding 


on 


64- 


-bit 


systems 




size_t 


ss_size ; 












} stack 


-t; 















Program 7: Structure with padding depending on architecture. 

The syscall do_sigaltstack in kernel/signal . c copies such a structure back to userland 
via the copy function copy_to_user, however it does not clear the padding bytes, thus those 
are leaked to the user on a 64-bit system. In the function visible in Program [51 the high 
input is the structure oss and the low output is the argument uoss. 

CBMC supports modelling of 64-bit widths however that is not enough to automatically 
measure the padding bytes. This is because the sizeof operator in CBMC returns only 
the sum of all sizes without eventual bit alignments. This is solved in our approach by 
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int do_sigaltstack (const stack_t user *uss, stack_t 


user *uoss, 


unsigned long sp) { 




stack_t oss; 




... // oss fields get filled 




if (copy_to_user (uoss , toss, sizeof (oss) ) ) 




goto out ; .... 





Program 8: Leakage through copying whole structures including padding. 

providing a model of the copy_to_user function, just like e.g. an implementation of memcpy 
is provided, which checks if the length parameter is aligned according to the architecture (4 
bytes for 32 and 8 bytes for 64). If there are padding alignments then these will be chosen 
to be filled with nondeterministic integer values modulo the number of padding bytes. 

In Program [8l this would translate to the following: sizeof (oss) counts 20 bytes as 
the size of the structure. However, this does not account for the padding bytes, and our 
copy_to_user model does the following calculation: 

pad = ALIGN - (sizeof (oss) '/, ALIGN); 

if (pad == ALIGN) padding = 0; 

else padding = ((unsigned int) nondet_int () ) 7, (1 << (pad*8)) 

where ALIGN is chosen to be 4 or 8 depending on the architecture used. In a 64-bit system, 
this translates to 8 — (20%8) = 4 bytes for pad which are represented by the padding 
variable. 

With this setup, we were able to verify that on a 32-bit system the Program [8] does not 
leak anything, while on a 64-bit system this violates a policy of e.g. 7 bits. A counterexample 
was found within 49 minutes and 50 seconds. We were also able to prove that the patch 
applied removes the padding leak. The patch in this case was to not copy the whole struct 
but copying the three struct members separately through the function __put_user, where 
the padding does not come into play. 

cpuset. The crucial part of this vulnerability has already been discussed in Section 
14.11 Our analysis finds the right low input which triggers the integer underflow. The actual 
code however does not simply return the secret as shown in the section mentioned above, 
but it copies nbytes number of bytes from a buffer ctr->buf at offset *ppos. Because 



if (*ppos + nbytes > ctr->bufsz) 




nbytes = ctr->bufsz - *ppos; 




if (copy_to_user (buf , ctr->buf + *ppos, 


nbytes) ) 


return -EFAULT; 





of the underflow, nbytes and *ppos access memory way out of the actual buffer and thus 
disclose kernel memory. However our analysis of this vulnerability requires at the moment 
too much manual intervention to model memory access outside of the allowed bound ( i.e. 
ctr->buf + *ppos). 

One elegant way of addressing this problem would be by modifying CBMC itself; CBMC 
could for example return nondeterministic values for such out-of-bound memory accesses 
which would implicitly model the access to confidential data. 
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5.2. Authentication Checks. We analysed parts of the authentication routines of the 
secure remote password suite (SRP) and the Unix passwd-authentication of Cyrus' Internet 
Message Support Protocol daemon (IMSPD). 

SRP. To demonstrate that confidential variables and observations can be used flexibly, 
we checked that there is no leakage in the password request function in libsrp/t_getpass . c. 

The confidential input is the password entered by the user when being prompted at the 
login; the observations are the echos of the terminal of typed characters. Whether the 
terminal echos the typed characters or not depends on which mode the console is in. The 
environment modelling the console and its modes had to be provided to check this program. 



_TYPE( int ) t_getpass (char* buf , unsigned max] 


en, 






;onst char* p 


rompt) { 


DWORD mode ; 








GetConsoleMode ( 


handle , femode ) 






SetConsoleMode ( 


handle , mode & 


"ENABLE_ECHO_ 


INPUT ) ; 


if (f puts (prompt 


stdout) == EOF 


II 




f gets (buf, maxlen, stdin) == 


= NULL) { 




SetConsoleMode (handle , mode) 


j 




return -1 ; 
} .... 









Program 9: Side-effect of mode decides on echo output of fgets 

In Program [9l the function t_getpass first gets the current mode of the console by the 
function GetConsoleMode; then it sets a new console mode by inverting the bit ENABLE_ECHO_INPUT 
in the mode through the function SetConsoleMode which clearly disables the echo of input 
read from standard input. The function GetConsoleMode is modelled by nondeterministic- 
ally setting the mode to any integer value, the function SetConsoleMode sets a global mode 
variable to its second argument. The function fgets, which reads a number of bytes from 
stdin, is modelled to return its first argument buf completely if the mode is set to echo 
the input and return a constant value otherwise. 

With this setup CBMC proves through our driver that starting from any initial mode, 
the program will always end up with log 2 (| ~p |) = 0, i.e. that there is no leakage. We can 
also successfully check that if the line which disables the echo is removed then the policy is 
violated. 

IMSPD. The function checked in this test is login_plaintext in imsp/login_unix . c 
as shown in Program 1 101 

The program first tries to receive the stored password context of a user using the function 
getpwnam. If successful, it will compare the stored with the entered password using strcmp. 
If this fails it will set the string reply to "wrong password" . If authentication is successful 
it returns 0. 

Clearly, this function has three distinguishable observations: (1) it returns 1 (2) it returns 
1 and sets *reply (3) it returns 0. We modelled the three parameters to the function as low 
user input and the stored password as confidential variable. With this setup, we are able 
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int login_plaintext (char *user, char* pass, char** reply) { 

struct passwd* pwd = getpwnam(user) ; 

if (!pwd) return 1; 

if (strcmp(pwd->pw_passwd, crypt(pass, pwd->pw_passwd) ) != 0) { 

*reply = "wrong password" ; 

return 1 ; 
} 
return ; 



Program 10: Login function of IMSPD. 

to verify that this program conforms to a policy which only leaks 3 observations, within 9 
seconds. 

6. Related Work 

There have been several attempts in recent years to build a quantitative analysis of 
leakage, starting with the static analysis in [4j. 

The most relevant works for this paper are [I] by M. Backes, B. Kopf and A. Rybalchenko 
and J8] by J. Heusser and P. Malacaria where verification techniques are used to compute 
leakage of programs. Those works are both inspired by the important previous theoretical 
work on self composition by G. Barthe, P. D'Argenio, and T. Rezk [2] and T. Terauchi and 
A. Aiken [18]. However as already noted, those approaches attempts primarily to answer 
questions about how much a program leak and seem unable to scale to real code in terms 
of line of code, state space and language constructs. In particular, they have not, as far as 
we are aware, been used to analyse independently existing vulnerabilities in independently 
existing programs. 

On the theoretical side, the complexity of QIF analysis has recently been thoroughly 
investigated by H. Yasuoka and T. Terauchi [19] who, amongst other aspects, explored the 
relation to verification and k-safety properties. 

Approaches that do scale to large programs are by S. McCamant, M. D. Ernst [14j and 
J. Newsome, S. McCamant, D. Song [15]. They released an impressive tool, FlowCheck, 
which is able to analyse very large programs. There are however significant differences 
between the approaches in that FlowCheck is a security testing tool based on the Valgrind 
dynamic instrumentation framework whereas our approach is based on verification and 
static analysis techniques. Thus, our work comes with stronger theoretical guarantees (for 
example verification of the official patches) and does not require to "run" the code. 

D. Kroening's CBMC |5j has been used for many practical applications. A good overview 
over the applied fields can be found under the following link |6j. 

7. Conclusion 

In this paper we combined state of the art model checking with theoretical work on 
Quantitative Information Flow, to provide a powerful tool for the analysis of leakage of 
information. We demonstrated not only that CVE reported vulnerabilities such as for the 
Linux kernel can be analysed with a level of scalability and precision able to find real 
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security vulnerabilities, but that it is also possible to prove whether the official patches fix 
the problem. We argued that leaks are not synonymous of a security breach and hence 
a quantitative framework is better equipped than a qualitative one to determine when an 
information leak represents a security threat. 

We see this work as a significant step in the application of academic research on inform- 
ation flow analysis to real-world problems in systems software. 
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